Abstract. A controlled switching diffusion model is developed to study the hierarchical control of flexible manufacturing systems. The existence of a homogeneous Markov non-randomized optimal policy is established by a convex analytic method. Using the existence of such a policy, the existence of a unique solution in a certain class to the associated Hamilton-Jacobi-Bellman equations is established and the optimal policy is characterized as a minimizing selector of an appropriate Hamiltonian.


AMS(MOS) subject classifications: 93E20, 60J70.

Key words: Flexible manufacturing system, Wiener process, switching diffusion, Poisson measure, Markov policy, dynamic programming equations.

† Systems Research Center, 2167 A.V. Williams Bldg., University of Maryland, College Park, MD 20742. This research was supported in part by the Texas Advanced Research Program (Advanced Technology Program) under Grant No. 003658-093, in part by the Air Force Office of Scientific Research under Grant AFOSR-91-0033, in part by the National Science Foundation under Grant ECS-S617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044.


1. Introduction

We study a controlled switching diffusion process that arises in numerous applications of systems with multiple modes or failure modes, including the hierarchical control of flexible manufacturing systems. A flexible manufacturing system (FMS) consists of a set of workstations capable of performing a number of different operations and interconnected by a transportation mechanism. An FMS produces a family of parts related by similar operational requirements or by belonging to the same final assembly [27]. The rapidly growing range of applicability of FMS includes metal cutting, assembly of printed circuit boards, integrated circuit fabrication, automobile assembly lines, etc. Due to their tremendous flexibility, FMS are significantly more efficient in many ways than traditional manufacturing systems. However, the high capital cost of an FMS demands very efficient management of production and maintenance (repair/replacement) scheduling so that uncertain events such as random demand fluctuations, machine failures, inventory spoilages, sales returns, etc. can be taken care of. The large size of the system and its associated complexities make it imperative to divide the control or management into a hierarchy consisting of a number of levels. Thus the overall complex problem is reduced to a number of manageable subproblems at each level, and these levels are linked by means of a hierarchical integrative system. We refer to [1], [21], [27] for a detailed description of these hierarchical schemes. We will confine our attention to the top two levels, viz.

(i) Generation of decision tables, which is accomplished by developing a suitable mathematical model describing the dynamical evolution of the system. This is done off-line.

(ii) The flow control level: This plays the central role in the system. It determines on-line the production and maintenance scheduling and continuously feeds the routing control level, which calculates route splits and in turn governs the sequence controller, which determines the times at which to dispatch parts.

Since the top two levels directly govern the rest, it is of paramount importance to develop and study an appropriate mathematical model that facilitates finding on-line implementable optimal feedback policies.

We first present a heuristic description of our model, which is a modified version of the model in [1], [21], [27]. The FMS consists of $L$ workstations, with each workstation having a number $L_m$ of identical machines ($m = 1, 2, \ldots, L$). A family of $N$ types of different parts is produced. Let $u(t) = [u_1(t), \ldots, u_N(t)]^T \in \mathbb{R}^N$ and $d(t) = [d_1(t), \ldots, d_N(t)]^T \in \mathbb{R}^N$ denote the production rate (a control variable) and the downstream demand rate vectors of this family of parts, respectively. Also, $X(t) = [X_1(t), \ldots, X_N(t)]^T \in \mathbb{R}^N$ denotes the downstream buffer stock. A negative value of $X_j(t)$, $j = 1, \ldots, N$, indicates a backlogged demand for part $j$, while a positive value is the size of the inventory stored in the buffers. The evolution of $X(t)$ is governed by the following stochastic differential equations

\[ \frac{dX(t)}{dt} = u(t) - d(t) + \operatorname{diag}(\sigma_1, \ldots, \sigma_N)\, \xi(t) \tag{1.1} \]

where $\sigma_i > 0$, $i = 1, \ldots, N$, and $\xi(t) = [\xi_1(t), \ldots, \xi_N(t)]^T$ is an $N$-dimensional white noise which can be interpreted as "sales returns", "inventory spoilage", "sudden demand fluctuations," etc. (see [8]).
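As an illustration of (1.1), the following minimal sketch simulates the buffer of a single part type ($N = 1$) by the Euler-Maruyama scheme, treating the white noise as the formal derivative of a Brownian motion. The function name `simulate_buffer` and all numerical values are hypothetical and are not taken from the model above; the production and demand rates are held constant for simplicity.

```python
import math
import random


def simulate_buffer(x0, u, d, sigma, horizon, dt, rng):
    """Euler-Maruyama path of dX = (u - d) dt + sigma dW for one part type."""
    steps = int(round(horizon / dt))
    x = x0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x += (u - d) * dt + sigma * dw
        path.append(x)
    return path


# Hypothetical numbers: production slightly above demand, moderate noise.
rng = random.Random(0)
path = simulate_buffer(x0=0.0, u=1.2, d=1.0, sigma=0.5, horizon=10.0, dt=0.01, rng=rng)

# Fraction of sampled times at which the buffer is negative (backlogged demand).
backlog_fraction = sum(1 for x in path if x < 0) / len(path)
```

With `sigma=0.0` the path reduces to the deterministic line $x_0 + (u - d)t$, which gives a quick sanity check of the scheme.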

If $S_m(t)$ denotes the number of operational machines in station $m$ at time $t$, then the state of the workstations may be represented by the $L$-tuple $S(t) = (S_1(t), \ldots, S_L(t))$. The evolution of $S(t)$ is influenced by the inventory size and production scheduling, and can also be controlled by various decisions such as produce, repair, replace, etc. The dynamics of $S(t)$ can be described as follows:

\[ P\{S_m(t + \delta t) = \ell + 1 \mid S_m(t) = \ell\} = \begin{cases} (L_m - \ell)\, v_m(t)\, \delta t + o(\delta t) & \text{for } 0 \le \ell < L_m \\ 0 & \text{otherwise} \end{cases} \tag{1.2} \]


where $v_m(t)$, $m = 1, \ldots, L$, are suitable control variables. In the uncontrolled case, $v_m(t) = \gamma_m$, which represents the infinitesimal repair rate at station $m$. These repair rates may implicitly depend on $X(t)$. This model also allows for a control variable reflecting the decision as to whether to repair or replace on the basis of the inventory size. Also,

\[ P\{S_m(t + \delta t) = \ell - 1 \mid S_m(t) = \ell\} = \begin{cases} \ell\, p_m(X(t), u(t))\, \delta t + o(\delta t) & \text{for } 0 < \ell \le L_m \\ 0 & \text{otherwise} \end{cases} \tag{1.3} \]

where $p_m$ models the infinitesimal failure rate at the $m$th station. Equations (1.2) and (1.3) imply that

\[ P\{S_m(t + \delta t) = \ell_1 \mid S_m(t) = \ell_2\} = 0, \quad \text{for } |\ell_1 - \ell_2| > 1. \]

With $i$ and $j$ denoting two states of the system, we define

\[ \lambda_{ij}(\cdot)\, \delta t + o(\delta t) = P\{S(t + \delta t) = j \mid S(t) = i\}, \quad i \ne j \]

and

\[ \lambda_{ii}(\cdot) = -\sum_{j \ne i} \lambda_{ij}(\cdot). \]

The machine state $S(t)$ can thus be modeled as a continuous time controlled jump process taking values in a finite state space. In the uncontrolled case, $S(t)$ becomes a continuous time homogeneous Markov chain with the infinitesimal generator given by the matrix $[\lambda_{ij}]$.
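To make the structure of $[\lambda_{ij}]$ concrete, the sketch below assembles the generator for a single workstation under the simplifying assumption that both the repair rate and the failure rate are constants (in the model above the failure rate may depend on $X(t)$ and $u(t)$). The function name `station_generator` and the numerical rates are hypothetical.

```python
def station_generator(num_machines, repair_rate, failure_rate):
    """Generator matrix of S_m(t) for one station with constant rates.

    State l is the number of operational machines, l = 0, ..., num_machines.
    l -> l + 1 occurs at rate (num_machines - l) * repair_rate   (cf. (1.2))
    l -> l - 1 occurs at rate l * failure_rate                   (cf. (1.3))
    """
    size = num_machines + 1
    q = [[0.0] * size for _ in range(size)]
    for l in range(size):
        if l < num_machines:
            q[l][l + 1] = (num_machines - l) * repair_rate
        if l > 0:
            q[l][l - 1] = l * failure_rate
        q[l][l] = -sum(q[l])  # diagonal entry makes the row sum to zero
    return q


# Hypothetical station with two machines.
Q = station_generator(num_machines=2, repair_rate=0.5, failure_rate=0.1)
```

The matrix is tridiagonal, reflecting the fact that jumps of size larger than one have probability zero to first order, and every row sums to zero as required by the definition of $\lambda_{ii}$.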

The choice of the production rate at each instant is constrained by the capacity of the currently operational machines. This translates into the requirement that at each time $t$ the production rates must lie in some set $\Gamma(S(t))$ which depends on the machine state.

Let $y^k_{mn}(t)$ be the number of type $n$ parts which undergo operation $k$ at the $m$th station per unit interval of time and $\tau^k_{mn}(t)$ the length of time required for the completion of this operation. The product $y^k_{mn}(t)\, \tau^k_{mn}(t)$ is the portion of each unit time interval that one or more operational machines at station $m$ must dedicate to perform operation $k$ on type $n$ parts, as dictated by the flow rate $y^k_{mn}(t)$. Since the amount of work completed at each station per unit time interval cannot exceed the time available at the operational machines, the following constraint applies

\[ \sum_{n} \sum_{k} y^k_{mn}(t)\, \tau^k_{mn}(t) \le S_m(t), \quad \text{for all } m. \tag{1.4} \]

Also, assuming that no material is allowed to accumulate within the system, the throughput $u_n(t)$ of type $n$ parts must satisfy

\[ u_n(t) = \sum_{m} y^k_{mn}(t), \quad \text{for all } k \text{ and } n. \tag{1.5} \]

Therefore, for each state $i$ the set $\Gamma(i)$ is defined as the collection of all production rates $u = [u_1, \ldots, u_N]^T$ for which, with the machine state $S(t) = i$, there exist feasible flow rates $y^k_{mn}(t)$ satisfying (1.4) and (1.5).
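A membership test for the capacity set can be sketched under a strong simplifying assumption that is not part of the model above: every operation $k$ on part $n$ is routed to one designated station, so that by (1.5) the flow rate at that station equals $u_n$ and (1.4) becomes a linear inequality in $u$. The function name `is_feasible` and the table `work`, whose entry `work[m][n]` is the total processing time station `m` spends per unit of part `n`, are hypothetical.

```python
def is_feasible(u, work, machines_up):
    """Check the capacity constraint (1.4) under a fixed routing.

    u[n]           production rate of part type n
    work[m][n]     processing time needed at station m per unit of part n
    machines_up[m] number of operational machines at station m
    """
    for m, up in enumerate(machines_up):
        load = sum(u_n * t for u_n, t in zip(u, work[m]))
        if load > up + 1e-12:  # small tolerance for rounding
            return False
    return True


# Hypothetical data: two stations, two part types.
work = [[0.5, 1.0], [0.25, 0.5]]
print(is_feasible([1.0, 1.0], work, machines_up=[2, 1]))
print(is_feasible([1.0, 1.0], work, machines_up=[1, 1]))
```

In the general case, where an operation may be split among several stations, deciding membership would instead require solving a linear feasibility problem in the flow rates.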

The flow control problem can now be stated. Given an initial buffer state $X(0) = x$ and machine state $S(0) = i$, we wish to specify a production plan and maintenance (repair/replacement) policy that minimizes the performance index

\[ J(x, i, u, v) = E\left[ \int_0^\infty e^{-\alpha t}\, c(X(t), S(t), u(t), v(t))\, dt \;\Big|\; X(0) = x,\ S(0) = i \right] \tag{1.6} \]

where $c(\cdot)$ is a 'cost' function, $\alpha > 0$ is a discount factor, $u(\cdot)$ is the production rate, and $v(\cdot)$ is the maintenance rate. The objective is to find $u(\cdot)$, $v(\cdot)$ for which the minimum is achieved in (1.6). The ideal production and maintenance policy for a wide class of cost functions would minimize $J$ by producing parts at exactly the demand rate, thereby keeping the buffer at zero. Such a policy is generally impossible because of the failures of the machines and various other uncertainties.

This  FMS  model  motivates  the  study  of  a  stochastic  optimization  problem  in  a  more 
abstract  setting  which  will  subsume  the  flow  control  problem  in  the  FMS  as  a  special  case. 
This  abstract  problem  is  manifested  in  numerous  other  situations.  In  [17]  it  is  encountered  in 
a  hybrid  model  proposed  for  the  study  of  dynamic  phenomena  in  large  scale  interconnected 
power  networks.  Sworder  [39],  [40]  describes  possible  applications  to  macroeconomic  models 
and  dynamic  renewal  problems  in  general.  In  addition,  it  should  be  useful  at  other  levels 
of  the  hierarchy  described  in  [21]. 

We will briefly describe this problem formally; a rigorous description will be given in the next section. Let $\mathcal{S} = \{1, 2, \ldots, M\}$ and let $U_i$, $i = 1, \ldots, M$, be prescribed compact metric spaces. For each $i, j \in \mathcal{S}$, let $b(\cdot, \cdot, i, \cdot) : \mathbb{R}_+ \times \mathbb{R}^N \times U_i \to \mathbb{R}^N$ and $\lambda_{ij} : \mathbb{R}_+ \times \mathbb{R}^N \times U_i \to \mathbb{R}$, satisfying $\lambda_{ij} \ge 0$ for $i \ne j$ and $\sum_{j=1}^{M} \lambda_{ij}(\cdot) = 0$. A stochastic process $(X(t), S(t))$ taking values in $\mathbb{R}^N \times \mathcal{S}$ is given by

\[ X(t) = X(0) + \int_0^t b(\tau, X(\tau), S(\tau), u(\tau))\, d\tau + \operatorname{diag}(\sigma_1, \ldots, \sigma_N)\, W(t) \tag{1.7} \]

\[ P\{S(t + \delta t) = j \mid S(t) = i,\ X(s), S(s),\ s \le t\} = \lambda_{ij}(t, X(t), u(t))\, \delta t + o(\delta t) \tag{1.8} \]

where $\sigma_i > 0$, $i = 1, \ldots, N$, are constants and $W(\cdot) = [W_1(\cdot), \ldots, W_N(\cdot)]^T$ is an $N$-dimensional standard Brownian motion. The control $u(\cdot)$ is a $U := \bigcup_{i=1}^{M} U_i$-valued process such that when $S(t) = i$, $u(\cdot)$ takes values in $U_i$ and $u(\cdot)$ is non-anticipative with respect to the driving Brownian motion $W(t)$. Let $c : \mathbb{R}_+ \times \mathbb{R}^N \times \mathcal{S} \times U \to \mathbb{R}_+$ be the cost function and $\alpha > 0$ a prescribed discount factor. Define a cost functional of the form

\[ E\left[ \int_0^\infty e^{-\alpha t}\, c(t, X(t), S(t), u(t))\, dt \right]. \tag{1.9} \]

The objective is to find an optimal control policy $u(\cdot)$ which minimizes (1.9) and is in the feedback form $u(t) = v(t, X(t), S(t))$ for a suitably defined function $v$. In the next section we will assume appropriate conditions on $b$ and $\lambda$ which will guarantee that (1.7), (1.8) are well defined. We note here that for a performance index of the form (1.9), $b$, $\lambda$, $c$ may be assumed to be independent of $t$ without any loss of generality. Also, by replacing each $U_i$ by $\prod_{k=1}^{M} U_k$ and $b(\cdot, i, \cdot)$ by its composition with the projection $\prod_{k=1}^{M} U_k \to U_i$ we may assume that each $U_i$ is a replica of a fixed compact metric space.

We now briefly mention some earlier work leading to ours. The class of controlled piecewise deterministic models with jump Markov disturbances has been studied by Sworder [38], Rishel [35], Olsder and Suri [33], Davis [19], Vermes [42] among many others. The piecewise deterministic FMS model has been studied by Kimemia and Gershwin [27], who have developed a heuristic numerical method based on the maximum principle established in [35]. Akella and Kumar [2] have studied a simplified model and obtained explicit solutions for one machine producing a single commodity. In all these papers the jump process is modeled as a continuous time (uncontrolled) Markov chain. Boukas and Haurie [14], [15] have modified the FMS model of Kimemia and Gershwin by introducing new state variables describing machine wear as well as a control parameter in the jump process; their model incorporates preventive maintenance. They have obtained a maximum principle, thereby extending the Rishel formalism in [35]. They have also considered piecewise deterministic models. To obtain an optimal policy of the feedback type in these models one has to put very strong conditions on terms like $b$, $\lambda$ governing the system and stringent restrictions on the set of allowable policies. At the same time it is assumed in these models that between two successive jumps of $S(t)$, the dynamics governing $X(t)$ is deterministic. Thus certain unavoidable 'environmental' uncertainties are not taken into account. These factors restrict the scope of applicability of these models. We have tried to circumvent these difficulties by including an additive noise term in the state dynamics. This is specifically done in order to take into account the various sources of environmental randomness. Addition of this noise removes practically all restrictions imposed on the set of allowable control policies, thereby substantially enhancing the range of applicability of the model. The switching diffusion problem has also been studied by Bensoussan and Lions [7], using a martingale problem formulation. However, our motivation and approach are quite different. In [7], it is assumed that for some $\delta > 0$, $-\lambda_{ii} \ge \delta > 0$ for each $i$. We have, instead, used a strong formulation which is very important for practical applications. In our formulation we do not need the condition $-\lambda_{ii} \ge \delta > 0$. We also refer to [6], [8], [9], [16], [20], [32], [36], [37], [43] for related work.

Our paper is structured as follows. A rigorous description of the mathematical model of the FMS is given in Section 2. The optimization problem is formulated and subsequently reduced to an equivalent convex optimization problem, via the study of associated occupation measures. The compactness of laws is established in Section 3, the convexity and extremality of occupation measures are studied in Section 4, and the proof of existence of optimal policies is given in Section 5. Section 6 deals with the characterization of optimal policies via dynamic programming equations. In Section 7, we apply our theory to a simplified model and derive some interesting results. Finally, Section 8 contains some concluding remarks. We note that we have used a convex analytic approach for this problem, as opposed to the traditional analytic one. For the discounted cost criterion the latter is more economical and is therefore sketched in the appendix. However, the convex analytic approach is interesting in its own right and would be more flexible and powerful for certain other purposes, e.g. the pathwise average cost problem or problems with several constraints, to which the analytic approach does not seem to be amenable. For (nonswitching) controlled diffusions these problems have been treated in [11, Chap. VI] and [12] by the convex analytic approach. We hope our approach to switching diffusions will be useful in various other situations.


2. Mathematical Model and Preliminaries

Let $U$ be a compact metric space and $\mathcal{S} = \{1, \ldots, M\}$. Let $b = [b_1, \ldots, b_N]^T : \mathbb{R}^N \times \mathcal{S} \times U \to \mathbb{R}^N$. For each $i \in \mathcal{S}$, $b(\cdot, i, \cdot)$ is assumed to be bounded, continuous and Lipschitz in its first argument uniformly with respect to the third. For $i, j \in \mathcal{S}$, let $\lambda_{ij} : \mathbb{R}^N \times U \to \mathbb{R}$ be bounded, continuous and Lipschitz in its first argument uniformly with respect to the second. Also, assume that for $i, j \in \mathcal{S}$, $i \ne j$, $\lambda_{ij} \ge 0$, and $\sum_{j=1}^{M} \lambda_{ij} = 0$ for any $i \in \mathcal{S}$.

Let $\sigma_i > 0$, $i = 1, 2, \ldots, N$, be prescribed numbers. For a Polish space $Y$, $\mathfrak{B}(Y)$ will denote its Borel $\sigma$-field and $\mathcal{P}(Y)$ the space of probability measures on $\mathfrak{B}(Y)$ endowed with the Prohorov topology, i.e. the topology of weak convergence [10]. Let $\mathfrak{M}(Y)$ be the set of all non-negative integer-valued, $\sigma$-finite measures on $\mathfrak{B}(Y)$. Equip $\mathfrak{M}(Y)$ with the smallest $\sigma$-field with respect to which all maps from $\mathfrak{M}(Y)$ into $\mathbb{N} \cup \{\infty\}$ of the form $\mu \mapsto \mu(B)$, with $B \in \mathfrak{B}(Y)$, are measurable. $\mathfrak{M}(Y)$ will always be assumed to be endowed with this measurability structure. Let $V = \mathcal{P}(U)$ and extend $b$ to $b = [b_1, \ldots, b_N]^T : \mathbb{R}^N \times \mathcal{S} \times V \to \mathbb{R}^N$ by

\[ b_i(\cdot, \cdot, v) := \int_U b_i(\cdot, \cdot, u)\, v(du), \quad v \in V,\ i = 1, \ldots, N. \tag{2.1} \]

Similarly, for $i, j \in \mathcal{S}$, $\lambda_{ij} : \mathbb{R}^N \times V \to \mathbb{R}$ is defined as

\[ \lambda_{ij}(\cdot, v) := \int_U \lambda_{ij}(\cdot, u)\, v(du), \quad v \in V,\ i, j \in \mathcal{S}. \tag{2.2} \]

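For $i, j \in \mathcal{S}$, $x \in \mathbb{R}^N$, $v \in V$, we construct the intervals $\Delta_{ij}(x, v)$ of the real line in the following manner (see also [13], [17]):

\[ \begin{aligned}
\Delta_{12}(x,v) &= \big[0,\ \lambda_{12}(x,v)\big) \\
\Delta_{13}(x,v) &= \big[\lambda_{12}(x,v),\ \lambda_{12}(x,v) + \lambda_{13}(x,v)\big) \\
&\ \ \vdots \\
\Delta_{1M}(x,v) &= \Big[\sum_{j=2}^{M-1} \lambda_{1j}(x,v),\ \sum_{j=2}^{M} \lambda_{1j}(x,v)\Big) \\
\Delta_{21}(x,v) &= \Big[\sum_{j=2}^{M} \lambda_{1j}(x,v),\ \sum_{j=2}^{M} \lambda_{1j}(x,v) + \lambda_{21}(x,v)\Big) \\
&\ \ \vdots \\
\Delta_{2M}(x,v) &= \Big[\sum_{j=2}^{M} \lambda_{1j}(x,v) + \sum_{\substack{j=1 \\ j \ne 2}}^{M-1} \lambda_{2j}(x,v),\ \sum_{j=2}^{M} \lambda_{1j}(x,v) + \sum_{\substack{j=1 \\ j \ne 2}}^{M} \lambda_{2j}(x,v)\Big)
\end{aligned} \]

and so on. For fixed $x$ and $v$, these are disjoint intervals, and the length of $\Delta_{ij}(x, v)$ is $\lambda_{ij}(x, v)$. Now define a function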

\[ h : \mathbb{R}^N \times \mathcal{S} \times V \times \mathbb{R} \to \mathbb{R} \]

by

\[ h(x, i, v, z) = \begin{cases} j - i & \text{if } z \in \Delta_{ij}(x, v) \\ 0 & \text{otherwise.} \end{cases} \tag{2.3} \]
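The construction of the intervals $\Delta_{ij}(x, v)$ and of $h$ can be sketched as follows for a fixed pair $(x, v)$, so that the rates reduce to a numerical matrix. The names `build_intervals` and `h`, the dictionary representation, and the sample rates are illustrative choices and not part of the original construction; states are numbered $1, \ldots, M$ as in the text.

```python
def build_intervals(rates):
    """Return {(i, j): (left, right)} for i != j, with states numbered 1..M.

    rates[i-1][j-1] holds lambda_ij(x, v) at a fixed (x, v). The half-open
    intervals [left, right) are laid end to end in the order
    (1,2), (1,3), ..., (1,M), (2,1), (2,3), ..., so they are disjoint and
    the interval for (i, j) has length lambda_ij.
    """
    m = len(rates)
    intervals = {}
    left = 0.0
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            if i == j:
                continue
            right = left + rates[i - 1][j - 1]
            intervals[(i, j)] = (left, right)
            left = right
    return intervals


def h(i, z, intervals):
    """Jump size j - i if z lies in Delta_ij for some j, and 0 otherwise (cf. (2.3))."""
    for (a, j), (left, right) in intervals.items():
        if a == i and left <= z < right:
            return j - i
    return 0


# Hypothetical rates for M = 3 (rows sum to zero; diagonal entries are unused here).
rates = [[-0.5, 0.2, 0.3], [0.1, -0.1, 0.0], [0.4, 0.6, -1.0]]
intervals = build_intervals(rates)
```

Because a Poisson random measure with Lebesgue intensity charges $\Delta_{ij}$ at rate equal to its length, a point landing in $\Delta_{ij}$ while the process is in state $i$ produces the jump $i \to j$ at rate $\lambda_{ij}$; points landing in intervals belonging to other states are ignored since $h$ returns 0 there.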


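Let $(X(t), S(t))$ be the $(\mathbb{R}^N \times \mathcal{S})$-valued controlled switching diffusion process given by the following stochastic differential equations

\[ \begin{aligned}
dX(t) &= b(X(t), S(t), v(t))\, dt + \operatorname{diag}(\sigma_1, \ldots, \sigma_N)\, dW(t) \\
dS(t) &= \int_{\mathbb{R}} h(X(t), S(t-), v(t), z)\, \mathfrak{p}(dt, dz)
\end{aligned} \tag{2.4} \]

for $t \ge 0$ with $X(0) = X_0$ and $S(0) = S_0$ where

(i) $X_0$ is a prescribed $\mathbb{R}^N$-valued random variable.

(ii) $S_0$ is a prescribed $\mathcal{S}$-valued random variable.

(iii) $W(\cdot) = [W_1(\cdot), \ldots, W_N(\cdot)]^T$ is an $N$-dimensional standard Wiener process independent of $X_0$, $S_0$.

(iv) $\mathfrak{p}(dt, dz)$ is an $\mathfrak{M}(\mathbb{R}_+ \times \mathbb{R})$-valued Poisson random measure with intensity $dt \times m(dz)$, where $m$ is the Lebesgue measure on $\mathbb{R}$ ([25, p. 70]).

(v) $\mathfrak{p}(\cdot, \cdot)$ and $W(\cdot)$ are independent.

(vi) $v(\cdot)$ is a $V$-valued process with measurable sample paths satisfying the following non-anticipativity property. Let

\[ \mathfrak{F}^{v}_{t} = \sigma\{v(s) : s \le t\}, \qquad \mathfrak{F}^{W,\mathfrak{p}}_{[t,\infty)} = \sigma\{W(s) - W(t),\ \mathfrak{p}(A, B) : A \in \mathfrak{B}([s, \infty)),\ B \in \mathfrak{B}(\mathbb{R}),\ s \ge t\}. \]

Then $\mathfrak{F}^{v}_{t}$ and $\mathfrak{F}^{W,\mathfrak{p}}_{[t,\infty)}$ are independent.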

Such a process $v(\cdot)$ will be called an admissible (control) policy. If $v(\cdot)$ is a Dirac measure, i.e. $v(\cdot) = \delta_{u(\cdot)}$, where $u(\cdot)$ is a $U$-valued process, then it is called an admissible non-randomized policy. An admissible policy $v(\cdot)$ is called feedback if $v(\cdot)$ is progressively measurable with respect to the natural filtration of $(X(\cdot), S(\cdot))$. A particular subclass of feedback policies is of special interest. A feedback policy $v(\cdot)$ is called a (non-homogeneous) Markov policy if $v(\cdot) = \tilde{v}(\cdot, X(\cdot), S(\cdot))$ for a measurable map $\tilde{v} : \mathbb{R}_+ \times \mathbb{R}^N \times \mathcal{S} \to V$. With an abuse of notation the map $\tilde{v}$ itself is called a Markov policy. If $\tilde{v}$ has no explicit time dependence, it is called a homogeneous Markov policy. Thus a homogeneous Markov non-randomized policy can be identified with a measurable map $v : \mathbb{R}^N \times \mathcal{S} \to U$.
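A rough feel for a switching diffusion driven by a homogeneous Markov non-randomized policy can be obtained from the first-order scheme sketched below for $N = 1$. It is only an illustrative discretization, not the strong-solution construction used in this paper: $X$ is advanced by an Euler-Maruyama step and $S$ jumps to $j$ with probability $\lambda_{S j}(x, u)\,\delta t$ per step, in the spirit of (1.8). The names `simulate`, `policy`, `drift`, `rates`, the 0-based state labels (0 for machine down, 1 for machine up) and all numbers are assumptions made for the example.

```python
import math
import random


def simulate(policy, drift, rates, sigma, x0, s0, horizon, dt, rng):
    """First-order scheme for (X, S): diffusion step for X, jump step for S."""
    x, s = x0, s0
    path = [(0.0, x, s)]
    steps = int(round(horizon / dt))
    for k in range(1, steps + 1):
        u = policy(x, s)          # feedback: depends on the current state only
        lam = rates(x, u)         # lam[i][j] approximates lambda_ij(x, u)
        x_new = x + drift(x, s, u) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        # Jump s -> j with probability lam[s][j] * dt, up to o(dt).
        r, acc = rng.random(), 0.0
        for j, rate in enumerate(lam[s]):
            if j == s:
                continue
            acc += rate * dt
            if r < acc:
                s = j
                break
        x = x_new
        path.append((k * dt, x, s))
    return path


DEMAND, CAPACITY, THRESHOLD = 1.0, 2.0, 0.5  # hypothetical numbers


def policy(x, s):
    """Produce at capacity below a stock threshold, otherwise at the demand rate."""
    if s == 0:
        return 0.0
    return CAPACITY if x < THRESHOLD else DEMAND


def drift(x, s, u):
    return u - DEMAND


def rates(x, u):
    return [[-0.5, 0.5], [0.1, -0.1]]  # repair rate 0.5, failure rate 0.1


path = simulate(policy, drift, rates, sigma=0.3, x0=0.0, s0=1,
                horizon=5.0, dt=0.01, rng=random.Random(1))
```

The threshold rule used for `policy` is merely one convenient example of a measurable map from $\mathbb{R} \times \mathcal{S}$ into the control set; nothing in the sketch suggests that it is optimal.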

If $(W(\cdot), \mathfrak{p}(\cdot, \cdot), X_0, S_0, v(\cdot))$ satisfying the above are given on a prescribed probability space $(\Omega, \mathfrak{F}, P)$, then under our assumptions on $b$ and $\lambda$, equation (2.4) will admit an a.s. unique strong solution [22, Chap. 3], [25, Chap. 3, Sect. 2c] and $X(\cdot) \in C(\mathbb{R}_+; \mathbb{R}^N)$, $S(\cdot) \in D(\mathbb{R}_+; \mathcal{S})$, where $D(\mathbb{R}_+; \mathcal{S})$ is the space of right continuous functions on $\mathbb{R}_+$ with left limits taking values in $\mathcal{S}$. However, if $v(\cdot)$ is a feedback policy, then there exists a measurable map

\[ f : \mathbb{R}_+ \times C(\mathbb{R}_+; \mathbb{R}^N) \times D(\mathbb{R}_+; \mathcal{S}) \to V \]

such that for each $t \ge 0$, $v(t) = f(t, X(\cdot), S(\cdot))$ and is measurable with respect to the $\sigma$-field generated by $\{X(s), S(s) : s \le t\}$. Thus $v(\cdot)$ cannot be specified a priori in (2.4). Instead, one has to replace $v(t)$ in (2.4) by $f(t, X(\cdot), S(\cdot))$ and (2.4) takes the form

\[ \begin{aligned}
dX(t) &= b\big(X(t), S(t), f(t, X(\cdot), S(\cdot))\big)\, dt + \operatorname{diag}(\sigma_1, \ldots, \sigma_N)\, dW(t) \\
dS(t) &= \int_{\mathbb{R}} h\big(X(t), S(t-), f(t, X(\cdot), S(\cdot)), z\big)\, \mathfrak{p}(dt, dz)
\end{aligned} \tag{2.5} \]

for $t \ge 0$ with $X(0) = X_0$ and $S(0) = S_0$. In general, (2.5) will not even admit a weak solution. However, if the feedback policy is a Markov policy, then the existence of a unique strong solution can be established. We now introduce some notation which will be used throughout. Define


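\[ L^1(\mathbb{R}^N \times \mathcal{S}) = \{ f : \mathbb{R}^N \times \mathcal{S} \to \mathbb{R} : \text{for each } i \in \mathcal{S},\ f(\cdot, i) \in L^1(\mathbb{R}^N) \}. \]

$L^1(\mathbb{R}^N \times \mathcal{S})$ is endowed with the product topology of $(L^1(\mathbb{R}^N))^M$. Similarly, we define $C_0^\infty(\mathbb{R}^N \times \mathcal{S})$, $W^{k,p}_{loc}(\mathbb{R}^N \times \mathcal{S})$, etc. For $f \in W^{2,p}_{loc}(\mathbb{R}^N \times \mathcal{S})$, $u \in U$ we write

\[ L^u f(x, i) = L_i^u f(x, i) + \sum_{j=1}^{M} \lambda_{ij}(x, u)\, [f(x, j) - f(x, i)] \tag{2.6} \]

where

\[ L_i^u f(x, i) = \frac{1}{2} \sum_{j=1}^{N} \sigma_j^2\, \frac{\partial^2 f(x, i)}{\partial x_j^2} + \sum_{j=1}^{N} b_j(x, i, u)\, \frac{\partial f(x, i)}{\partial x_j} \tag{2.7} \]

and more generally, for $v \in V$,

\[ L^v f(x, i) = \int_U L^u f(x, i)\, v(du). \tag{2.8} \]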
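The action of the operator in (2.6) and (2.7) on a test function can be checked numerically. The sketch below treats the scalar case $N = 1$ and takes the first and second derivatives of $f$ as explicit callables; the function name `generator`, the 0-based state labels and the sample data are hypothetical.

```python
def generator(f, dfdx, d2fdx2, x, i, u, sigma, drift, rates):
    """Evaluate L^u f(x, i) for scalar x, following (2.6)-(2.7) with N = 1."""
    diffusion = 0.5 * sigma ** 2 * d2fdx2(x, i) + drift(x, i, u) * dfdx(x, i)
    lam = rates(x, u)
    switching = sum(lam[i][j] * (f(x, j) - f(x, i)) for j in range(len(lam)))
    return diffusion + switching


# Hypothetical test function f(x, i) = x^2 + i on two states labelled 0 and 1.
f = lambda x, i: x * x + i
dfdx = lambda x, i: 2.0 * x
d2fdx2 = lambda x, i: 2.0
drift = lambda x, i, u: u - 1.0
rates = lambda x, u: [[-0.5, 0.5], [0.1, -0.1]]

value_state0 = generator(f, dfdx, d2fdx2, x=1.0, i=0, u=2.0, sigma=0.5, drift=drift, rates=rates)
value_state1 = generator(f, dfdx, d2fdx2, x=1.0, i=1, u=2.0, sigma=0.5, drift=drift, rates=rates)
```

For these data the diffusion part equals $\tfrac12 (0.25)(2) + (1)(2) = 2.25$ in both states, while the switching part equals $0.5 \cdot 1 = 0.5$ in state 0 and $0.1 \cdot (-1) = -0.1$ in state 1. The term with $j = i$ contributes nothing to the sum, so the diagonal entries of the rate matrix play no role.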


Theorem 2.1. Under a Markov policy $v$, (2.4) admits an a.s. unique strong solution such that $(X(\cdot), S(\cdot))$ is a strong Feller process with extended generator $L^v$.

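Proof (Sketch). This proof is based on the technique involving the removal of drift [41], [10, Thm. 1.4, pp. 10-12]. Clearly it suffices to prove the result in the interval $[0, T]$ for a fixed $T > 0$. For $T > 0$ let $H$ be the function space defined by

\[ H = \Big\{ g \in W^{1,2,p}_{loc}([0, T] \times \mathbb{R}^N \times \mathcal{S}),\ 2 \le p < \infty : \text{for each } i \in \mathcal{S},\ \sup_{0 \le t \le T} |g(t, x, i)| \text{ grows slower than } \exp(k \|x\|^2) \text{ for all } k > 0 \Big\}. \tag{2.9} \]

For $1 \le i \le N$ and $j \in \mathcal{S}$, let $\varphi_i(t, x, j)$ be the unique solution in $H$ (as in (2.9)) of

\[ \begin{aligned}
\frac{\partial \varphi_i(t, x, j)}{\partial t} + L^{v(t, x, j)} \varphi_i(t, x, j) &= 0 \\
\varphi_i(T, x, j) &= x_i,
\end{aligned} \tag{2.10} \]

where $x = (x_1, \ldots, x_N)$. Let $\varphi = [\varphi_1, \ldots, \varphi_N]^T$. It can be shown that for fixed $j$, $\varphi(t, \cdot, j)$ is a homeomorphism onto its range for each $t \in [0, T]$. Set $Y(t) = \varphi(t, X(t), S(t))$, $t \in [0, T]$. Using Ito's formula, it follows that $Y(t)$ satisfies

\[ \begin{aligned}
Y(t) = Y(0) &+ \int_0^t \big[ (D\varphi_s \operatorname{diag}(\sigma_1, \ldots, \sigma_N)) \circ \varphi_s^{-1} \big](Y(s))\, dW(s) \\
&+ \int_0^t \int_{\mathbb{R}} \big[ \varphi_s\big( \varphi_s^{-1}(Y(s-)) + \tilde{h}(\varphi_s^{-1}(Y(s-)), z) \big) - Y(s-) \big]\, \mathfrak{p}(ds, dz)
\end{aligned} \tag{2.11} \]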
where $D\varphi_s$, $\varphi_s^{-1}$ denote respectively the Jacobian matrix and the inverse map of $\varphi(s, \cdot, S(s))$, '$\circ$' indicates composition of functions and

\[ \tilde{h}(\cdot, \cdot, \cdot) = [0, 0, \ldots, 0, h(\cdot, \cdot, \cdot)]^T \in \mathbb{R}^{N+1}. \]

Now by [41], equation (2.11) has an a.s. unique strong solution which is a Markov process. The corresponding claim for $(X(t), S(t))$ follows via the homeomorphic property of $\varphi$. It remains to show the strong Feller property. Pick any bounded measurable function $f : \mathbb{R}^N \times \mathcal{S} \to \mathbb{R}$. The system of equations

\[ \begin{aligned}
\frac{\partial \psi(t, x, i)}{\partial t} + L^{v(t, x, i)} \psi(t, x, i) &= 0 \\
\psi(T, x, i) &= f(x, i)
\end{aligned} \tag{2.12} \]

can be shown to have a unique solution in $H$ [18]. Therefore, by Ito's formula it follows that

\[ \psi(t, x, i) = E[f(X(T), S(T)) \mid X(t) = x,\ S(t) = i] \]

where the expectation is under the Markov policy $v$. By Sobolev's imbedding theorem [5, p. 53], $H \subset C([0, T] \times \mathbb{R}^N \times \mathcal{S})$ and hence $\psi(t, \cdot, \cdot)$ is continuous for each $t \in [0, T]$. □

Some  comments  are  in  order  now. 


Remark  2.1. 

(i) We have used Ito's formula for functions in $W^{1,2,p}_{loc}(\mathbb{R}_+ \times \mathbb{R}^N \times \mathcal{S})$. This generalization is due to Krylov [28, pp. 121-127] for "classical" diffusions. Its extension for the present system is routine.

(ii) The well-posedness of the Cauchy problem for the weakly coupled parabolic system (2.10) has been established in [18] under slightly stronger conditions on the first order terms. However, in view of the results in [3], [30, Chap. 7], its extension to the present case is straightforward.


Remark 2.2. We have seen in Theorem 2.1 that under a Markov policy the corresponding solution $(X(\cdot), S(\cdot))$ of (2.4) is a Markov process. We have the following converse result. Let $v(\cdot)$ be a feedback policy, such that the corresponding solution $(X(\cdot), S(\cdot))$ of (2.4) is a Markov process. Then $v(\cdot)$ may be taken to be a Markov policy. Since we do not need this result, we omit the proof.


The  Optimization  Problem. 

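Let $c : \mathbb{R}^N \times \mathcal{S} \times U \to \mathbb{R}_+$ be a bounded, continuous cost function, and extend it to $c : \mathbb{R}^N \times \mathcal{S} \times V \to \mathbb{R}_+$ by

\[ c(\cdot, \cdot, v) = \int_U c(\cdot, \cdot, u)\, v(du). \]

Let $\alpha > 0$ be a prescribed discount factor. Let $v(\cdot)$ be an admissible policy and $(X(\cdot), S(\cdot))$ the corresponding process. Then the total $\alpha$-discounted cost under $v(\cdot)$ is defined as

\[ J_v(x, i) := E\left[ \int_0^\infty e^{-\alpha t}\, c(X(t), S(t), v(t))\, dt \;\Big|\; X(0) = x,\ S(0) = i \right]. \tag{2.13} \]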
If the laws of $X_0$, $S_0$ are $\pi \in \mathcal{P}(\mathbb{R}^N)$, $\xi \in \mathcal{P}(\mathcal{S})$ respectively, then

\[ J_v(\pi, \xi) = \sum_{i \in \mathcal{S}} \int_{\mathbb{R}^N} J_v(x, i)\, \pi(dx)\, \xi(i). \tag{2.14} \]

Let

\[ V(x, i) := \inf_{v(\cdot)} \{ J_v(x, i) \}, \tag{2.15} \]

\[ V(\pi, \xi) := \inf_{v(\cdot)} \{ J_v(\pi, \xi) \}. \tag{2.16} \]

The function $V(x, i)$ is called the ($\alpha$-discounted) value function. An admissible policy $v(\cdot)$ satisfying

\[ J_v(\pi, \xi) = V(\pi, \xi) \]

is called an optimal policy for the initial law $(\pi, \xi)$. An admissible policy is called optimal if it is optimal for any initial law. Our aim is to find an admissible optimal policy which is homogeneous Markov and non-randomized.

We now introduce (discounted) occupation measures [12]. Let $v(\cdot)$ be an admissible policy and $(X(\cdot), S(\cdot))$ the corresponding process with initial law $(\pi, \xi)$. Define the occupation measure $\nu[\pi, \xi; v] \in \mathcal{P}(\mathbb{R}^N \times \mathcal{S} \times U)$ by

\[ \int f\, d\nu[\pi, \xi; v] = \alpha\, E\left[ \int_0^\infty e^{-\alpha t} \int_U f(X(t), S(t), u)\, v(t)(du)\, dt \right] \tag{2.17} \]

for $f \in C_b(\mathbb{R}^N \times \mathcal{S} \times U)$. Let

\[ M_1[\pi, \xi] = \{ \nu[\pi, \xi; v] : v(\cdot) \text{ is admissible} \} \tag{2.18} \]

\[ M_2[\pi, \xi] = \{ \nu[\pi, \xi; v] : v(\cdot) \text{ is homogeneous Markov} \} \tag{2.19} \]

\[ M_3[\pi, \xi] = \{ \nu[\pi, \xi; v] : v(\cdot) \text{ is homogeneous non-randomized Markov} \} \tag{2.20} \]

In terms of these occupation measures

\[ J_v(\pi, \xi) = \alpha^{-1} \int c\, d\nu[\pi, \xi; v]. \tag{2.21} \]

We will show in the following sections that $M_1[\pi, \xi] = M_2[\pi, \xi]$ and that $M_2[\pi, \xi]$ is compact, convex and $M_2^e[\pi, \xi] \subset M_3[\pi, \xi]$, where $M_2^e[\pi, \xi]$ is the set of extreme points of $M_2[\pi, \xi]$. Thus for a fixed initial law, the optimization problem (2.13) will reduce to a convex optimization problem in view of (2.21).
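The identity (2.21) can be verified by hand in a degenerate special case, which is assumed here purely for illustration: the cost and the switching rates depend neither on $x$ nor on the control, so that only the finite-state chain with generator $Q = [\lambda_{ij}]$ matters. In that case the discounted cost vector is given by the resolvent, $J = (\alpha I - Q)^{-1} c$, and the occupation measure on $\mathcal{S}$ is $\nu = \alpha\, \xi\, (\alpha I - Q)^{-1}$. The helper names `solve`, `discounted_cost`, `occupation_measure` and the numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def discounted_cost(q, cost, alpha):
    """J = (alpha I - Q)^{-1} c, the discounted cost from each initial state."""
    n = len(q)
    a = [[(alpha if i == j else 0.0) - q[i][j] for j in range(n)] for i in range(n)]
    return solve(a, cost)


def occupation_measure(q, init, alpha):
    """nu = alpha * init * (alpha I - Q)^{-1}, obtained from the transposed system."""
    n = len(q)
    a_t = [[(alpha if i == j else 0.0) - q[j][i] for j in range(n)] for i in range(n)]
    return solve(a_t, [alpha * p for p in init])


# Hypothetical two-state chain: state 0 is costly, state 1 is cheap.
Q = [[-0.5, 0.5], [0.1, -0.1]]
cost = [4.0, 1.0]
alpha = 0.2
init = [0.0, 1.0]

J = discounted_cost(Q, cost, alpha)
nu = occupation_measure(Q, init, alpha)
lhs = sum(p * j for p, j in zip(init, J))               # J_v(xi)
rhs = sum(w * c for w, c in zip(nu, cost)) / alpha      # alpha^{-1} * integral of c d(nu)
```

Here `nu` is a probability vector and `lhs` agrees with `rhs`, which is the finite-state analogue of (2.21): the cost is linear in the occupation measure, and this linearity is what makes the convex analytic reformulation possible.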


3.  Compactness  of  Laws 

We will establish the compactness of laws of $(X(\cdot), S(\cdot))$ under various policies using the approach in [11, Chap. 2]. Let $\pi_0 \in \mathcal{P}(\mathbb{R}^N)$, $\xi_0 \in \mathcal{P}(\mathcal{S})$. Let $\mathcal{L}_i[\pi_0, \xi_0] \subset \mathcal{P}(C(\mathbb{R}_+; \mathbb{R}^N) \times D(\mathbb{R}_+; \mathcal{S}))$, $i = 1, 2, 3$, denote the set of laws of $(X(\cdot), S(\cdot))$ under all admissible/Markov/homogeneous Markov policies with fixed initial law $(\pi_0, \xi_0)$.

Theorem 3.1. The set $\mathcal{L}_1[\pi_0, \xi_0]$ is compact in $\mathcal{P}(C(\mathbb{R}_+; \mathbb{R}^N) \times D(\mathbb{R}_+; \mathcal{S}))$.

Proof. It clearly suffices to replace $\mathbb{R}_+$ by $[0, T]$ for arbitrary $T > 0$. Fix $T > 0$. Let $(X^n(\cdot), S^n(\cdot), W^n(\cdot), \mathfrak{p}^n(\cdot, \cdot), v^n(\cdot), X_0^n, S_0^n)$, $n \ge 1$, satisfy (2.4) on probability spaces $(\Omega^n, \mathfrak{F}^n, P^n)$ respectively, the laws of $X_0^n$, $S_0^n$ being $\pi_0$, $\xi_0$ respectively for all $n$. Let $\{f_i\}$ be a countable dense subset of the unit ball of $C(U)$. Define $\beta_j^n(t) = \int f_j\, dv^n(t)$, $t \in [0, T]$. Let $B$ denote the closed unit ball of $L^\infty[0, T]$ with the topology given by the weak topology of $L^2[0, T]$ relativized to $B$. Let $E$ be a countable product of replicas of $B$. Since $B$ is compact and metrizable and hence Polish, the same follows for $E$. Let $\beta^n(\cdot) = [\beta_1^n(\cdot), \beta_2^n(\cdot), \ldots]$, $n \ge 1$, viewed as $E$-valued random variables. Using the assumed conditions on $b$, it can be easily shown that for $t_1, t_2 \in [0, T]$,

\[ E\big[ \|X^n(t_2) - X^n(t_1)\|^4 \big] \le K |t_2 - t_1|^2 \]

for some $T$-dependent $K > 0$. It follows that the laws of the sequence $\{X^n(\cdot)\}$ are tight in $\mathcal{P}(C(\mathbb{R}_+; \mathbb{R}^N))$. Since $\mathcal{S}$ is finite and $E$ is compact, it follows by Prohorov's theorem [24, Thm. 2.6, p. 7] that, for $A_1 \in \mathfrak{B}(\mathbb{R}_+)$, $A_2 \in \mathfrak{B}(\mathbb{R})$ fixed, the sequence $\{X^n(\cdot), S^n(\cdot), \beta^n(\cdot), W^n(\cdot), \mathfrak{p}^n(A_1 \times A_2)\}$ converges to a limit $(X(\cdot), S(\cdot), \beta(\cdot), W(\cdot), \mathfrak{p}(A_1 \times A_2))$. Dropping to a subsequence if necessary and invoking Skorohod's theorem [24, p. 9], we may assume that all these random variables are defined on a common probability space and the convergence is a.s. on this probability space. By [11, Lemma II.1.2, p. 24] we can find a $V$-valued process $v(\cdot)$ such that $\beta_i(t) = \int f_i\, dv(t)$, $i \ge 1$. Define $Z^n(\cdot) = [Z_1^n(\cdot), \ldots, Z_N^n(\cdot)]^T$, $Y^n(\cdot)$, $n \ge 1$, by

\[ Z_i^n(t) = X_i^n(t) - \int_0^t b_i(X^n(s), S^n(s), v^n(s))\, ds, \quad t \ge 0 \]

\[ Y^n(t) = S^n(t) - \sum_{j=1}^{M} \int_0^t \lambda_{S^n(s-), j}(X^n(s), v^n(s))\, (j - S^n(s-))\, ds, \quad t \ge 0 \]

and $Z(\cdot) = [Z_1(\cdot), \ldots, Z_N(\cdot)]^T$, $Y(\cdot)$ by

\[ Z_i(t) = X_i(t) - \int_0^t b_i(X(s), S(s), v(s))\, ds, \quad t \ge 0 \]

\[ Y(t) = S(t) - \sum_{j=1}^{M} \int_0^t \lambda_{S(s-), j}(X(s), v(s))\, (j - S(s-))\, ds, \quad t \ge 0. \]

Then by [11, Lemma II.1.3, p. 26] and standard representation theorems for semimartingales [25, pp. 172-178] applied to $Z_i(t)$ and $Y(t)$, it follows that on an augmented probability space $(X(\cdot), S(\cdot))$ satisfies (2.4) for an admissible policy $v(\cdot)$ and driven by a Wiener process $W(\cdot)$ and a Poisson random measure $\mathfrak{p}(\cdot, \cdot)$. □

We now state the next theorem without proof, as the proof would be almost identical to that
of [11, Thm. II.2.1, p. 29] in view of the estimates in [30, p. 582].


Theorem  3.2.  The  sets  £2  [^o  >£()]>  ^3 [^o ■> ^0 ]  are  compact. 

Let $\{v_n\}$ be a sequence of homogeneous Markov policies and $(X^n(\cdot), S^n(\cdot))$ the
corresponding solutions of (2.4) with $X^n(0) = x_0$, $S^n(0) = i_0$ for all $n$. Let
$p^n(t, x_0, i_0, y, j)$ be the fundamental solutions corresponding to the operators $(\partial/\partial t + L^{v_n})$.
Let $(X^n(\cdot), S^n(\cdot)) \to (X^\infty(\cdot), S^\infty(\cdot))$ in law, where the latter is governed by a homogeneous
Markov policy $v_\infty$. Then using the Hölder estimates on $p^n(t, x_0, i_0, y, j)$ [30, p. 582] one can
show the following result as in [10, Thm. II.2.2, p. 33].

Lemma 3.1. For each $t > 0$, $p^n(t, x_0, i_0, \cdot) \to p^\infty(t, x_0, i_0, \cdot)$ in $L^1(\mathbb{R}^N \times S)$. In other
words, the laws of $(X^n(t), S^n(t))$ converge to that of $(X^\infty(t), S^\infty(t))$ in total variation.

Following  [11,  p.  30],  we  topologize  the  space  of  all  homogeneous  Markov  policies.  Let 
$$F = \{v : \mathbb{R}^N \times S \to V : v \text{ is measurable}\}.$$

Topologize  F  as  in  [11,  p.  30].  Then  F  is  a  compact  metric  space.  Its  topology  is  determined 
by  the  following  convergence  criterion  [11,  Lemma  II.2.1,  p.  32]. 

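Lemma 3.2. Let $f \in L^2(\mathbb{R}^N \times S) \cap L^1(\mathbb{R}^N \times S)$, $g \in C_b(\mathbb{R}^N \times S \times U)$ and $v_n \to v$ in $F$.
Then

$$\int_{\mathbb{R}^N} f(x,i) \int_U g(x,i,\cdot)\,dv_n(x,i)\,dx \;\xrightarrow[n\to\infty]{}\; \int_{\mathbb{R}^N} f(x,i) \int_U g(x,i,\cdot)\,dv(x,i)\,dx \tag{3.1}$$

for each $i \in S$. Conversely, if (3.1) holds for all such $f$, $g$ and $i \in S$, then $v_n \to v$ in $F$.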

Let $\mathcal{L}(v)$ denote the law of $(X(\cdot), S(\cdot))$ when $X(0) = x_0$, $S(0) = i_0$ and the homogeneous
Markov policy $v$ is used. Using Lemma 3.1, the following theorem can be proved in exactly
the same way as [11, Thm. II.2.3, p. 34].

Theorem 3.3. The map $v \mapsto \mathcal{L}(v)$ from $F$ into $\mathcal{P}(C(\mathbb{R}_+;\mathbb{R}^N) \times D(\mathbb{R}_+;S))$ is continuous.


4.  Convexity  and  Extremality  of  Occupation  Measures 

In this section we will study the properties of the occupation measures $\nu[\pi,\xi;v]$
introduced in (2.17), following the approach in [12].

Lemma 4.1. The sets $M_1[\pi,\xi]$, $M_2[\pi,\xi]$, $M_3[\pi,\xi]$ as defined in (2.18)-(2.20) are compact.

Proof.  This  follows  from  Theorems  3.1  and  3.2.  □ 

Lemma 4.2. For each fixed initial law $(\pi,\xi)$, $M_1[\pi,\xi] = M_2[\pi,\xi]$.

Proof. Let $\nu[\pi,\xi;v] \in M_1[\pi,\xi]$. Disintegrate it as

$$\nu[\pi,\xi;v](dx \times \{i\} \times du) = \bar\nu[\pi,\xi;v](dx \times \{i\})\, v(x,i)(du) \tag{4.1}$$

where $\bar\nu[\pi,\xi;v]$ is the marginal of $\nu[\pi,\xi;v]$ on $\mathbb{R}^N \times S$ and $v(x,i)$ is a version of the regular
conditional law, defined $\bar\nu[\pi,\xi;v]$-a.s. Pick any version from this equivalence class and keep
it fixed henceforth. The map $v(\cdot,\cdot)$ obviously defines a homogeneous Markov policy. Let
$(X'(\cdot), S'(\cdot))$ be the solution of (2.4) with $v(\cdot)$ replaced by $v'(\cdot) = v(X'(\cdot), S'(\cdot))$ and with
initial law $(\pi,\xi)$. Let $f \in C_b(\mathbb{R}^N \times S \times U)$ and let

$$\varphi(x,i) = E\left[\int_0^\infty e^{-\alpha t} \int_U f(X'(t), S'(t), u)\, v'(t)(du)\, dt \,\Big|\, X'(0) = x,\ S'(0) = i\right]. \tag{4.2}$$


Using the strong Markov property of $(X'(\cdot), S'(\cdot))$ (this follows from the Feller property) and
the local solvability of weakly coupled systems of elliptic equations [31, Chap. 7, p. 388] it
can be shown by employing standard arguments involving Ito's formula that $\varphi$ is the
unique solution in $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S) \cap C_b(\mathbb{R}^N \times S)$, $2 \le p < \infty$, to

$$L^{v(x,i)}\varphi(x,i) - \alpha\varphi(x,i) + \int_U f(x,i,u)\, v(x,i)(du) = 0. \tag{4.3}$$


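Define a process $Y(\cdot)$ by

$$Y(t) = \int_0^t e^{-\alpha s} \int_U f(X(s), S(s), u)\, v(s)(du)\, ds + e^{-\alpha t}\varphi(X(t), S(t)).$$

Then

$$E[Y(t)] - E[Y(0)] = E[Y(t)] - \sum_{j \in S} \int_{\mathbb{R}^N} \varphi(x,j)\, \pi(dx)\, \xi(j)$$
$$= E\left[\int_0^t e^{-\alpha s} \Big( L^{v(s)}\varphi(X(s), S(s)) - \alpha\varphi(X(s), S(s)) + \int_U f(X(s), S(s), u)\, v(s)(du) \Big)\, ds\right]. \tag{4.4}$$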


Letting $t \to \infty$ and using the definition of $v(\cdot,\cdot)$ and (4.3), it follows that the right-hand
side in (4.4) tends to zero (cf. [11, Thm. 4.2, pp. 40-42]). Thus

$$\lim_{t \to \infty} E[Y(t)] = E[\varphi(X_0, S_0)] = E[\varphi(X'_0, S'_0)].$$

Since $f \in C_b(\mathbb{R}^N \times S \times U)$ was arbitrary, it follows that $\nu[\pi,\xi;v] = \nu[\pi,\xi;v']$. □

Let $\nu[\pi,\xi;v] \in M_2[\pi,\xi]$. By a routine extension of the inequality [28, p. 66] it follows that
$\bar\nu[\pi,\xi;v]$ (as in (4.1)) is absolutely continuous with respect to the product of the Lebesgue
measure on $\mathbb{R}^N$ and the counting measure on $S$ and therefore has a density $\varphi[\pi,\xi;v]$. Let
$\eta[\pi,\xi;v]$ be the marginal of $\bar\nu[\pi,\xi;v]$ on $S$. With 'supp' denoting the support of a measure,
let

$$\mathrm{supp}(\eta[\pi,\xi;v]) = S_1[\pi,\xi;v] \subset S. \tag{4.5}$$

It is not difficult to see that $\varphi[\pi,\xi;v](x,i) > 0$ a.e. $x \in \mathbb{R}^N$, $i \in S_1[\pi,\xi;v]$ and
$\varphi[\pi,\xi;v](x,i) = 0$ for $i \in S \setminus S_1[\pi,\xi;v]$. For $f \in W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S)$ define

$$L^v_\alpha f(x,i) = L^v f(x,i) - \alpha f(x,i). \tag{4.6}$$
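Then $\varphi[\pi,\xi;v]$ is the unique solution in $L^1(\mathbb{R}^N \times S)$ to:

$$\sum_{i \in S} \int_{\mathbb{R}^N} L^{v(x,i)}_\alpha g(x,i)\, \varphi(x,i)\, dx = -\alpha \sum_{i \in S} \int_{\mathbb{R}^N} g(x,i)\, \pi(dx)\, \xi(i),$$
$$\sum_{i \in S} \int_{\mathbb{R}^N} \varphi(x,i)\, dx = 1, \qquad \varphi(x,i) \ge 0, \tag{4.7}$$

for every $g \in C_0^\infty(\mathbb{R}^N \times S)$. Using the above, we will show that $M_2[\pi,\xi]$ is convex.

Lemma 4.3. The set $M_2[\pi,\xi]$ is convex.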

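Proof. Let $v_1$, $v_2$ be two homogeneous Markov policies and $0 < a < 1$. Define a homogeneous
Markov policy $v$ by

$$v(x,i) = \frac{a\,\varphi[\pi,\xi;v_1](x,i)\, v_1(x,i) + (1-a)\,\varphi[\pi,\xi;v_2](x,i)\, v_2(x,i)}{a\,\varphi[\pi,\xi;v_1](x,i) + (1-a)\,\varphi[\pi,\xi;v_2](x,i)} \tag{4.8}$$

for $(x,i) \in \mathbb{R}^N \times \{S_1[\pi,\xi;v_1] \cup S_1[\pi,\xi;v_2]\}$ and arbitrary otherwise. Let $f \in C_0^\infty(\mathbb{R}^N \times S)$.
It is easy to see that

$$L^{v(x,i)}_\alpha f(x,i) = \frac{a\,\varphi[\pi,\xi;v_1](x,i)\, L^{v_1(x,i)}_\alpha f(x,i) + (1-a)\,\varphi[\pi,\xi;v_2](x,i)\, L^{v_2(x,i)}_\alpha f(x,i)}{a\,\varphi[\pi,\xi;v_1](x,i) + (1-a)\,\varphi[\pi,\xi;v_2](x,i)}.$$

Let $\psi(x,i) = a\,\varphi[\pi,\xi;v_1](x,i) + (1-a)\,\varphi[\pi,\xi;v_2](x,i)$. From (4.7) and (4.8) it follows that
$\psi = \varphi[\pi,\xi;v]$. Thus

$$\nu[\pi,\xi;v](dx \times \{i\} \times du) = \varphi[\pi,\xi;v](x,i)\, dx\; v(x,i)(du)$$
$$= a\,\varphi[\pi,\xi;v_1](x,i)\, dx\; v_1(x,i)(du) + (1-a)\,\varphi[\pi,\xi;v_2](x,i)\, dx\; v_2(x,i)(du)$$
$$= \big(a\,\nu[\pi,\xi;v_1] + (1-a)\,\nu[\pi,\xi;v_2]\big)(dx \times \{i\} \times du). \quad □$$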
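
The algebra behind the density-weighted mixture used in this proof can be checked numerically on a toy discretization. The sketch below is only an illustration under assumptions that are not part of the paper: the state space is replaced by a finite grid, the action space by a finite set, and the densities and policies are random placeholders. It verifies the final chain of equalities (the mixed measure equals the convex combination), not the fact that the mixed density solves (4.7).

```python
import numpy as np

def mix_policies(a, phi1, phi2, v1, v2):
    """Density-weighted mixture of two randomized policies, in the spirit of (4.8).

    phi1, phi2: shape (n_states,), nonnegative densities on a hypothetical grid.
    v1, v2: shape (n_states, n_actions); each row is a probability vector over
            a finite action set (a stand-in for v(x, i)(du)).
    Returns (psi, v): the mixed density and the mixed policy.
    """
    psi = a * phi1 + (1.0 - a) * phi2
    num = a * phi1[:, None] * v1 + (1.0 - a) * phi2[:, None] * v2
    # Where psi vanishes the policy is arbitrary; use a uniform row there.
    v = np.full_like(v1, 1.0 / v1.shape[1])
    pos = psi > 0
    v[pos] = num[pos] / psi[pos][:, None]
    return psi, v

# Hypothetical data: 6 grid states, 3 actions, mixing weight a = 0.3.
rng = np.random.default_rng(0)
n_states, n_actions, a = 6, 3, 0.3
phi1 = rng.random(n_states)
phi1 /= phi1.sum()
phi2 = rng.random(n_states)
phi2 /= phi2.sum()
v1 = rng.dirichlet(np.ones(n_actions), size=n_states)
v2 = rng.dirichlet(np.ones(n_actions), size=n_states)

psi, v = mix_policies(a, phi1, phi2, v1, v2)
nu1 = phi1[:, None] * v1   # discrete analogue of phi[v1] dx v1(du)
nu2 = phi2[:, None] * v2
nu = psi[:, None] * v      # discrete analogue of phi[v] dx v(du)
```

Each row of `v` is again a probability vector, and `nu` coincides with `a * nu1 + (1 - a) * nu2` up to rounding, which is the discrete counterpart of the last display in the proof.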


Let

$$\mathcal{I}[\pi,\xi] = \{\bar\nu[\pi,\xi;v] \in \mathcal{P}(\mathbb{R}^N \times S) : \nu[\pi,\xi;v] \in M_2[\pi,\xi]\} \tag{4.9}$$

where $\bar\nu[\pi,\xi;v]$ is as in (4.1).

The proof of the next lemma is analogous to that of [12, Lemma 3.2]. We present a brief
sketch describing the essential ideas.

Lemma 4.4. The set $\mathcal{I}[\pi,\xi]$ is compact in $\mathcal{P}(\mathbb{R}^N \times S)$ in total variation.

Proof. By a routine extension of the inequality [28, p. 66] to the present case, $\varphi[\pi,\xi;v]$
will be uniformly bounded in $L^p(\mathbb{R}^N)$. For the sake of convenience, assume that the initial
condition is $(x_0, i_0) \in \mathbb{R}^N \times S$. As in [11, Lemma 5.2, p. 44] we can show by considering
appropriate estimates on the weakly coupled systems of elliptic equations [31, Chap. 7] that
for any bounded open set $A$ such that $\bar{A} \subset \mathbb{R}^N \setminus \{x_0\}$ and $i \in S \setminus \{i_0\}$ there exist a $\beta > 0$
and a $K \in (0,\infty)$ such that

$$|\varphi[x_0,i_0;v](y,i) - \varphi[x_0,i_0;v](z,i)| \le K\,\|y - z\|^\beta, \quad y, z \in A, \tag{4.10}$$

under any choice of a homogeneous Markov policy $v$. By Theorem 3.2, $\mathcal{I}[x_0,i_0]$ is compact
in the Prohorov topology of $\mathcal{P}(\mathbb{R}^N \times S)$. Let $\{\bar\nu[x_0,i_0;v_n]\}$ be a sequence in $\mathcal{I}[x_0,i_0]$ and
$\bar\nu[x_0,i_0;v_\infty]$ a weak limit point of $\{\bar\nu[x_0,i_0;v_n]\}$. We need to show that $\varphi[x_0,i_0;v_n] \to \varphi[x_0,i_0;v_\infty]$
in $L^1(\mathbb{R}^N \times S)$. The equicontinuity of $\{\varphi[x_0,i_0;v_n]\}$ follows from (4.10). Also
(4.10) together with the uniform $L^p$-estimates implies pointwise boundedness. Thus by the
Arzela-Ascoli theorem we may drop to a subsequence if necessary to conclude that for each
$i \in S$,

$$\varphi[x_0,i_0;v_n](\cdot,i) \to \psi(\cdot,i)$$

for some $\psi(\cdot,i)$, uniformly on compact subsets of $\mathbb{R}^N$. By the uniform $L^p$-estimates the
convergence is also in $L^1(\mathbb{R}^N)$. Thus

$$\int_{\mathbb{R}^N} \varphi[x_0,i_0;v_n](y,i)\, f(y)\, dy \;\xrightarrow[n\to\infty]{}\; \int_{\mathbb{R}^N} \psi(y,i)\, f(y)\, dy \tag{4.11}$$

for all $f \in C_b(\mathbb{R}^N)$. But (4.11) certainly holds with $\varphi[x_0,i_0;v_\infty](\cdot,i)$ replacing $\psi(\cdot,i)$.
Therefore, $\varphi[x_0,i_0;v_\infty] = \psi$. □

We are now in a position to characterize the extreme points of $M_2[\pi,\xi]$. Let $v$ be a
homogeneous Markov policy such that for each $x \in \mathbb{R}^N$, $i \in S$,

$$v(x,i) = a\,v_1(x,i) + (1-a)\,v_2(x,i) \tag{4.12}$$

where $a \in (0,1)$ and $v_1$, $v_2$ are distinct homogeneous Markov policies, i.e. there exists at
least one $i_0 \in S$ such that $v_1(\cdot,i_0)$ and $v_2(\cdot,i_0)$ differ on a set of strictly positive measure.
The proof of the next lemma closely follows that of [12, Lemma 3.3]; we therefore present
only a brief sketch of the proof.

Lemma 4.5. Let $v$ be as in (4.12). Then $\nu[\pi,\xi;v]$ is not an extreme point of $M_2[\pi,\xi]$.

Proof. We will show that if $v$ satisfies (4.12), then there are homogeneous Markov policies
$\hat v_1$, $\hat v_2$ and $b \in (0,1)$ such that

$$\nu[\pi,\xi;v] = b\,\nu[\pi,\xi;\hat v_1] + (1-b)\,\nu[\pi,\xi;\hat v_2].$$

It suffices to find $b \in (0,1)$ and $\hat v_1$, $\hat v_2$ satisfying

$$v(x,i) = \frac{b\,\varphi[\pi,\xi;\hat v_1](x,i)\,\hat v_1(x,i) + (1-b)\,\varphi[\pi,\xi;\hat v_2](x,i)\,\hat v_2(x,i)}{b\,\varphi[\pi,\xi;\hat v_1](x,i) + (1-b)\,\varphi[\pi,\xi;\hat v_2](x,i)} \tag{4.13}$$

for $(x,i) \in \mathbb{R}^N \times \{S_1[\pi,\xi;\hat v_1] \cup S_1[\pi,\xi;\hat v_2]\}$ (see (4.5)). Let $R > 0$ and let $v'_1$, $v'_2$ be the
homogeneous Markov policies defined as

$$v'_j(x,i) = \begin{cases} v_j(x,i), & \|x\| \le R \\ v(x,i), & \|x\| > R \end{cases} \qquad i \in S,\ j = 1,2. \tag{4.14}$$

Let $\tilde v(\cdot)$ be a given homogeneous Markov policy. Define a homogeneous Markov policy $v''_2$
via

$$v(x,i) = a\,v'_1(x,i) + (1-a)\,v'_2(x,i) = \frac{b\,\varphi[\pi,\xi;v'_1](x,i)\,v'_1(x,i) + (1-b)\,\varphi[\pi,\xi;\tilde v](x,i)\,v''_2(x,i)}{b\,\varphi[\pi,\xi;v'_1](x,i) + (1-b)\,\varphi[\pi,\xi;\tilde v](x,i)} \tag{4.15}$$

for $(x,i) \in \mathbb{R}^N \times \{S_1[\pi,\xi;v'_1] \cup S_1[\pi,\xi;\tilde v]\}$ and arbitrary otherwise. The arguments used
in the proof of [12, Lemma 3.3], mutatis mutandis, will ensure a suitable choice of $b \in (0,1)$
such that $v''_2$ is a genuine homogeneous Markov policy. Fix a $b \in (0,1)$ as in (4.15). Given
a homogeneous Markov policy $\tilde v(\cdot)$ we obtain another, $v''_2(\cdot)$, via (4.15). Thus we have a map
$\bar\nu[\pi,\xi;\tilde v] \mapsto \bar\nu[\pi,\xi;v''_2]$ from $\mathcal{I}[\pi,\xi]$ to $\mathcal{I}[\pi,\xi]$. Using Lemma 4.4 it can be shown as in the
proof of [11, Lemma 3.3] that this map is continuous in total variation. By Schauder's
fixed point theorem [26, p. 220] this map has a fixed point. In other words, there exists a
homogeneous Markov policy $v''_2$ such that

$$v(x,i) = \frac{b\,\varphi[\pi,\xi;v'_1](x,i)\,v'_1(x,i) + (1-b)\,\varphi[\pi,\xi;v''_2](x,i)\,v''_2(x,i)}{b\,\varphi[\pi,\xi;v'_1](x,i) + (1-b)\,\varphi[\pi,\xi;v''_2](x,i)}$$

for $(x,i) \in \mathbb{R}^N \times \{S_1[\pi,\xi;v'_1] \cup S_1[\pi,\xi;v''_2]\}$. Since $v'_1 \ne v$ on a set of strictly positive
measure for sufficiently large $R$, $v''_2 \ne v'_1$ on this set. Thus

$$\nu[\pi,\xi;v] = b\,\nu[\pi,\xi;v'_1] + (1-b)\,\nu[\pi,\xi;v''_2]$$

as desired. □

Finally, the results of this section are summarized as follows.

Theorem 4.1. $M_1[\pi,\xi] = M_2[\pi,\xi]$, and this set is compact and convex, and each of
its extreme points corresponds to some $\nu[\pi,\xi;v]$ where $v$ is a homogeneous Markov
non-randomized policy.


5.  Existence  of  an  Optimal  Policy 

Using  the  results  of  the  previous  section,  we  will  establish  the  existence  of  an  optimal 
policy. 

Theorem  5.1.  There  exists  a  homogeneous  Markov  optimal  policy. 

Proof. Let $(\pi,\xi) \in \mathcal{P}(\mathbb{R}^N) \times \mathcal{P}(S)$ be such that $\mathrm{supp}(\pi) = \mathbb{R}^N$ and $\mathrm{supp}(\xi) = S$. Since $c$ is
bounded and continuous, the map $M_2[\pi,\xi] \ni \nu \mapsto \int c\, d\nu$ is continuous, and $M_2[\pi,\xi]$ is
compact by Lemma 4.1. Thus there exists a homogeneous Markov policy $v^*$ such that

$$J_{v^*}(\pi,\xi) = \min\{J_v(\pi,\xi) : v \text{ is homogeneous Markov}\}.$$

By Lemma 4.2, it follows that

$$J_{v^*}(\pi,\xi) = V(\pi,\xi).$$

Therefore $v^*$ is optimal for the initial law $(\pi,\xi)$. We will show that $v^*$ is optimal for any
initial law. It suffices to show that $v^*$ is optimal for any initial condition $(x,i) \in \mathbb{R}^N \times S$.
Suppose there exist $(x_0,i_0) \in \mathbb{R}^N \times S$ and a homogeneous Markov policy $\tilde v$ such that

$$J_{\tilde v}(x_0,i_0) < J_{v^*}(x_0,i_0). \tag{5.1}$$

Using the fact that the solution of (2.4) under a Markov policy is a Feller process, it can be
easily shown that the function $J_v(x,i)$ is continuous in $x$ for each $v$. Thus (5.1) holds in a
neighborhood $B$ of $x_0$. Define a policy $v'$ as follows:

$$v'(t) = v^*(X(t),S(t))\, I\{X_0 \notin B\} + \tilde v(X(t),S(t))\, I\{X_0 \in B\}$$

where $(X(\cdot), S(\cdot))$ is governed by $v'(\cdot)$. Note that under $v'(\cdot)$, (2.4) will admit a weak
solution; the proof is analogous to that of Theorem 2.1. Then it is easily shown that

$$J_{v'}(\pi,\xi) < J_{v^*}(\pi,\xi)$$

which is a contradiction. Thus $v^*$ is optimal. □

Theorem  5.2.  There  exists  a  homogeneous  Markov  non-randomized  optimal  policy. 

Proof. Let $v^*$ be as in Theorem 5.1. Let $M_2^e[\pi,\xi]$ be the set of extreme points of $M_2[\pi,\xi]$.
Since $M_2[\pi,\xi]$ is compact, by Choquet's theorem [34], $\nu[\pi,\xi;v^*]$ is the barycenter of a
probability measure $m$ supported on $M_2^e[\pi,\xi]$. Therefore,

$$\int c\, d\nu[\pi,\xi;v^*] = \int_{M_2^e[\pi,\xi]} \left(\int c\, d\mu\right) m(d\mu). \tag{5.2}$$

Since $v^*$ is optimal, it follows from (5.2) that there exists a $\nu[\pi,\xi;\tilde v] \in M_2^e[\pi,\xi]$ such that

$$\int c\, d\nu[\pi,\xi;v^*] = \int c\, d\nu[\pi,\xi;\tilde v].$$

Thus $\tilde v$ is also optimal. By Theorem 4.1 it is non-randomized. □
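
The step from (5.2) to the existence of an optimal extreme point has a simple finite-dimensional analogue, sketched below. Everything in it is a hypothetical stand-in: a finite list of "extreme points" with made-up costs replaces $M_2^e[\pi,\xi]$, and a probability vector replaces the representing measure $m$. The point is only that a barycentric average of costs can never fall below the smallest cost among the points being averaged, so an optimal average forces the mass onto minimizing points.

```python
import numpy as np

def barycenter_cost(costs, weights):
    """Cost of the barycenter: the weights-average of the extreme-point costs."""
    return float(np.dot(weights, costs))

# Hypothetical costs (values of the integral of c) at five extreme points.
rng = np.random.default_rng(1)
costs = rng.random(5)
weights = rng.dirichlet(np.ones(5))   # a representing probability vector

avg = barycenter_cost(costs, weights)
best = int(np.argmin(costs))          # index of a cheapest extreme point

# If all the mass sits on a minimizer, the barycenter attains the minimum.
one_hot = np.zeros(5)
one_hot[best] = 1.0
avg_at_best = barycenter_cost(costs, one_hot)
```

Here `avg` is never smaller than `costs[best]`, and `avg_at_best` equals `costs[best]`; in the theorem the same comparison is applied to the integral in (5.2).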

6.  Dynamic  Programming  Equations 

Using the existence results of the previous section, we will now derive the dynamic
programming or Hamilton-Jacobi-Bellman (HJB) equations, which in our case will be a
weakly coupled system of quasilinear elliptic equations, and then characterize the optimal
policy as a minimizing selector of an appropriate "Hamiltonian". The HJB equations for
our problem are

$$\alpha\varphi(x,i) = \inf_{u \in U}\,\big[L^u\varphi(x,i) + c(x,i,u)\big]. \tag{6.1}$$

Theorem 6.1. The value function $V(x,i)$ is the unique solution of (6.1) in the space
$W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S) \cap C_b(\mathbb{R}^N \times S)$ for any $2 \le p < \infty$.


Proof. We have already seen in the proof of Theorem 5.1 that $V(x,i) \in C_b(\mathbb{R}^N \times S)$. Let
$v^*$ be a homogeneous Markov non-randomized optimal policy and $(X(\cdot), S(\cdot))$ the
corresponding solution of (2.4). Then for $(x,i) \in \mathbb{R}^N \times S$,

$$V(x,i) = E\left[\int_0^\infty e^{-\alpha t}\, c\big(X(t), S(t), v^*(X(t),S(t))\big)\, dt \,\Big|\, X(0) = x,\ S(0) = i\right]. \tag{6.2}$$

By standard arguments (see the arguments following (4.2)), $V(x,i)$ is the unique solution
in $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S) \cap C_b(\mathbb{R}^N \times S)$ for any $2 \le p < \infty$ of

$$\alpha V(x,i) = L^{v^*(x,i)}V(x,i) + c(x,i,v^*(x,i)). \tag{6.3}$$

Suppose there exist $x_0 \in \mathbb{R}^N$, $i_0 \in S$, $u \in U$ and $\delta > 0$ such that

$$\alpha V(x_0,i_0) \ge L^u V(x_0,i_0) + c(x_0,i_0,u) + \delta.$$

Then by the continuity of $\nabla V(\cdot,i_0)$ the above will hold in a neighborhood $N(x_0)$ of $x_0$. Define
a homogeneous Markov non-randomized policy $\tilde v$ as follows:

$$\tilde v(x,i) = \begin{cases} v^*(x,i) & \text{if } (x,i) \notin N(x_0) \times S \\ u & \text{if } (x,i) \in N(x_0) \times S. \end{cases}$$

Then

$$\alpha V(x,i_0) \ge L^{\tilde v(x,i_0)}V(x,i_0) + c(x,i_0,\tilde v(x,i_0)) + \delta\, I\{x \in N(x_0)\}.$$

Now it is easily seen that

$$V(x,i_0) \ge J_{\tilde v}(x,i_0) + \delta'$$

for some $\delta' > 0$, which is a contradiction. Hence $V(x,i)$ satisfies (6.1). Let $V'$ be another
solution of (6.1) in the desired class. Then it can be shown using standard arguments (cf.
[11, Thm. III.2.4, pp. 69-70]) that

$$|V(x,i) - V'(x,i)| \le 2Ke^{-\alpha t}$$

where $K > 0$ is a constant. Letting $t \to \infty$, $V = V'$. □

Corollary 6.1. Assume that for each $i \in S$, $c(\cdot,i,\cdot)$ is Lipschitz in its first argument
uniformly with respect to the third. Then $V(x,i)$ is the unique solution of (6.1) in
$C^2(\mathbb{R}^N \times S) \cap C_b(\mathbb{R}^N \times S)$.

Proof. It suffices to show that $V$ is $C^2$. Since $V(x,i) \in W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S)$ for any $2 \le p < \infty$,
by Sobolev's imbedding theorem $V(x,i) \in C^{1,\gamma}(\mathbb{R}^N \times S)$ for $0 < \gamma < 1$, $\gamma$ arbitrarily close
to 1, and hence by our assumptions on $b$, $\lambda$, $c$, it is easy to see that

$$\alpha V(x,i) - \inf_{u \in U}\left[\sum_{j=1}^{N} b_j(x,i,u)\,\frac{\partial V(x,i)}{\partial x_j} + \sum_{j=1}^{M} \lambda_{ij}(x,u)\big(V(x,j) - V(x,i)\big) + c(x,i,u)\right]$$

is in $C^{0,\gamma}$. By elliptic regularity [23, p. 287] applied to (6.1) ($V$ replacing $\varphi$), we conclude
that $V \in C^{2,\gamma}$. □
Theorem 6.2. A homogeneous Markov non-randomized policy $v$ is optimal if and only if

$$\sum_{j=1}^{N} b_j(x,i,v(x,i))\,\frac{\partial V(x,i)}{\partial x_j} + \sum_{k=1}^{M} \lambda_{ik}(x,v(x,i))\big(V(x,k) - V(x,i)\big) + c(x,i,v(x,i))$$
$$= \inf_{u \in U}\left[\sum_{j=1}^{N} b_j(x,i,u)\,\frac{\partial V(x,i)}{\partial x_j} + \sum_{k=1}^{M} \lambda_{ik}(x,u)\big(V(x,k) - V(x,i)\big) + c(x,i,u)\right] \quad \text{a.e. } x \in \mathbb{R}^N,\ i \in S. \tag{6.4}$$

Proof. The 'necessity' part is contained in the proof of Theorem 6.1. We establish the
sufficiency. Let $v(\cdot,\cdot)$ satisfy (6.4). The existence of such a $v$ is guaranteed by a standard
measurable selection theorem [4, Lemma 1]. Let $v'$ be any other homogeneous Markov
non-randomized policy. Then using standard arguments involving Ito's formula and the strong
Markov property, it can be shown that

$$J_v(x,i) \le J_{v'}(x,i)$$

a.e. $x \in \mathbb{R}^N$, $i \in S$. Hence by Lemma 4.2,

$$J_v(x,i) \le J_{\tilde v}(x,i)$$

for any admissible policy $\tilde v$. Thus $v$ is optimal. □

Remark  6.1.  Thus  far,  we  have  assumed  that  the  cost  function  c  is  bounded.  However,  this 
condition  can  be  relaxed,  as  we  show  in  the  Appendix. 


7.  An  Application  to  a  Simplified  Model 

We consider a modified version of the model studied in [2]. Suppose there is one machine
producing a single commodity. Suppose that the demand rate is a constant $d > 0$. Let the
machine state $S(t)$ take values in $\{0,1\}$, with $S(t) = 0$ or $1$ according as the machine is down or
functional. Let $S(t)$ be a continuous time Markov chain with generator

$$\begin{bmatrix} -\lambda_0 & \lambda_0 \\ \lambda_1 & -\lambda_1 \end{bmatrix}.$$

The inventory $X(t)$ is governed by the Ito equation

$$dX(t) = (u(t) - d)\, dt + \sigma\, dW(t) \tag{7.1}$$

where $\sigma > 0$. The production rate $u(t)$ is constrained by

$$u(t) \begin{cases} = 0 & \text{if } S(t) = 0 \\ \in [0,R] & \text{if } S(t) = 1. \end{cases}$$
Let $c : \mathbb{R} \to \mathbb{R}_+$ be the cost function, which is assumed to be convex and Lipschitz
continuous. Let $\alpha > 0$ be the discount factor and let the value function be denoted by $V(x,i)$. In
this case $V(x,i)$ is the minimal non-negative $C^2$ solution of the HJB equation

$$\frac{\sigma^2}{2}\begin{pmatrix} V''(x,0) \\ V''(x,1) \end{pmatrix} + \begin{pmatrix} -d\,V'(x,0) \\ \min_{u \in [0,R]}\{(u-d)\,V'(x,1)\} \end{pmatrix} + c(x)\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{bmatrix} \lambda_0 + \alpha & -\lambda_0 \\ -\lambda_1 & \lambda_1 + \alpha \end{bmatrix}\begin{pmatrix} V(x,0) \\ V(x,1) \end{pmatrix}. \tag{7.2}$$

Using the convexity of $c(\cdot)$ it can be shown as in [2] that $V(\cdot,i)$ is convex for each $i$. Hence
there exists an $x^*$ such that

$$V'(x,1) \le 0 \ \text{ for } x \le x^*, \qquad V'(x,1) \ge 0 \ \text{ for } x \ge x^*. \tag{7.3}$$

From (7.2) it follows that the value of $u$ which minimizes $(u - d)V'(x,1)$ is

$$u = \begin{cases} R & \text{if } x < x^* \\ 0 & \text{if } x > x^*. \end{cases}$$

At $x = x^*$, $V'(x^*,1) = 0$ and therefore any $u \in [0,R]$ minimizes $(u - d)V'(x,1)$. Thus, in
view of Theorem 6.2, we can choose any $u \in [0,R]$ at $x = x^*$. To be specific, we choose
$u = d$ at $x = x^*$. It follows that the following homogeneous Markov non-randomized policy
is optimal:

$$v(x,0) = 0, \qquad v(x,1) = \begin{cases} R & \text{if } x < x^* \\ d & \text{if } x = x^* \\ 0 & \text{if } x > x^*. \end{cases} \tag{7.4}$$


We  note  at  this  point  that  the  piecewise  deterministic  model,  in  general,  would  lead  to  a 
singular  control  problem  when  V'(x,  1)  =  0  [2],  [27].  In  [2]  Akella  and  Kumar  have  obtained 
the  solution  of  the  HJB  equation  (this  would  be  (7.2)  without  the  second  order  term)  in 
closed  form  and  computed  an  explicit  expression  for  x* .  They  have  shown  that  a  policy  of 
the  type  (7.4)  is  optimal  among  all  homogeneous  Markov  non-randomized  policies.  In  our 
case  the  additive  noise  in  (7.1)  induces  a  smoothing  effect  to  remove  the  singular  situation; 
in  addition,  our  results  imply  that  the  policy  (7.4)  is  optimal  among  all  admissible  policies. 
The  only  limitation  of  our  model  is  that  it  would,  in  general,  be  very  difficult  to  solve  (7.2) 
analytically.  Therefore,  one  must  rely  on  numerical  methods  to  compute  our  optimal  policy 
of  the  type  (7.4). 
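
Since (7.2) must in general be handled numerically, one rough way to explore policies of the type (7.4) is by simulation. The following is a minimal sketch, not the authors' method: it discretizes (7.1) with an Euler-Maruyama step, switches the machine state with probability (rate) x (step size) per step, and truncates the discounted cost integral at a finite horizon. All parameter values, the cost $c(x) = |x|$, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def hedging_policy(x, s, x_star, R, d):
    """Policy of the type (7.4): no production when down, hedge around x_star."""
    if s == 0:
        return 0.0
    if x < x_star:
        return R
    if x > x_star:
        return 0.0
    return d

def discounted_cost(x0, s0, x_star, *, R=2.0, d=1.0, sigma=0.5, lam0=1.0,
                    lam1=0.2, alpha=0.5, cost=abs, T=20.0, dt=1e-2, seed=0):
    """One sample of the integral of e^{-alpha t} c(X(t)) over [0, T] under (7.4).

    The horizon T and step dt are truncation/discretization choices, so the
    result only approximates the infinite-horizon discounted cost.
    """
    rng = np.random.default_rng(seed)
    x, s, total = float(x0), int(s0), 0.0
    for k in range(int(T / dt)):
        t = k * dt
        total += np.exp(-alpha * t) * cost(x) * dt
        u = hedging_policy(x, s, x_star, R, d)
        # Euler-Maruyama step for dX = (u - d) dt + sigma dW.
        x += (u - d) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # Two-state chain: leave state 0 at rate lam0, state 1 at rate lam1.
        rate = lam0 if s == 0 else lam1
        if rng.random() < rate * dt:
            s = 1 - s
    return total

# Crude comparison of a few hypothetical thresholds by Monte Carlo averaging.
estimates = {
    x_star: np.mean([discounted_cost(0.0, 1, x_star, seed=k) for k in range(20)])
    for x_star in (0.0, 0.5, 1.0)
}
```

Averaging `discounted_cost` over many seeds for a range of thresholds gives a crude estimate of how the cost depends on the hedging point; a finer search, or a proper numerical solution of (7.2), would be needed to locate $x^*$ reliably.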

We now discuss the manufacturing model studied in [27] as described in the introduction.
The machine state $S(t)$ is again a prescribed continuous time Markov chain taking values
in $S = \{1, \ldots, M\}$. For each $i \in S$, the production rate $u = (u_1, \ldots, u_N)$ takes values in $U_i$,
which is a convex polyhedron in $\mathbb{R}^N$. The demand rate is $d = [d_1, \ldots, d_N]^T$. In this case,
if the cost function $c : \mathbb{R}^N \to \mathbb{R}_+$ is Lipschitz continuous and convex, then it can be shown
that for each $i \in S$, the value function $V(\cdot,i)$ is convex. But from this fact alone optimal
policies of the type (7.4) cannot be obtained. However, since an optimal homogeneous
Markov non-randomized policy $v(x,i)$ is determined by minimizing

$$\sum_{j=1}^{N} u_j\,\frac{\partial V(x,i)}{\partial x_j}$$

over $U_i$, $v(x,i)$ takes values at extreme points of $U_i$. Thus for each machine state $i$, an
optimal policy divides the buffer state space into a set of regions in which the production
rate is constant. If the gradient $\nabla V(x,i)$ is zero or orthogonal to a face of $U_i$, a unique
minimizing value does not exist. But again in view of Theorem 6.2 we may prescribe
arbitrary production rates at those points where $\nabla V(x,i) = 0$, and if $\nabla V(x,i)$ is orthogonal
to a face of $U_i$, we can choose any corner of that face. Hence once again we can circumvent
the singular situation.
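
The selection rule described above, minimizing a linear function of $u$ over a polyhedron and breaking ties by picking a corner, can be sketched as follows. This is an illustrative sketch under assumptions: the polyhedron is given by an explicit, hypothetical list of its vertices, the gradient is supplied as a plain vector, and the function name `vertex_selector` is invented here. The constant term involving the demand rate $d$ does not affect the minimizer and is omitted.

```python
import numpy as np

def vertex_selector(vertices, grad, tol=1e-12):
    """Pick a vertex of the polyhedron minimizing the linear form u . grad.

    vertices: array-like of shape (n_vertices, N), the corners of U_i.
    grad:     array-like of shape (N,), a stand-in for the gradient of V(., i).
    Returns (u, minimizers): the chosen corner and the indices of all corners
    that attain the minimum (more than one when grad is zero or orthogonal
    to a face).
    """
    vertices = np.asarray(vertices, dtype=float)
    values = vertices @ np.asarray(grad, dtype=float)
    minimizers = np.flatnonzero(values <= values.min() + tol)
    return vertices[minimizers[0]], minimizers

# Hypothetical U_i: the unit square in R^2.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

u_generic, idx_generic = vertex_selector(square, (-1.0, 2.0))  # unique corner
u_face, idx_face = vertex_selector(square, (0.0, 1.0))         # tie along a face
u_zero, idx_zero = vertex_selector(square, (0.0, 0.0))         # every corner ties
```

With a generic gradient a single corner is returned; when the gradient is orthogonal to a face, both corners of that face are reported and the first is chosen, which mirrors the freedom granted by Theorem 6.2.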


8.  Concluding  Remarks 

We  have  analyzed  the  optimal  control  of  switching  diffusions  with  a  discounted  criterion 
on  the  infinite  horizon.  The  model  allows  a  very  general  form  of  coupling  between  the 
continuous  and  the  discrete  components  of  the  process.  We  have  shown  that  there  exists 
a homogeneous, non-randomized Markov policy which is optimal in the class of all
admissible policies. Also, the existence of a unique solution in a certain class to the associated
Hamilton-Jacobi-Bellman  equations  is  established  and  the  optimal  policy  is  characterized 
as  a  minimizing  selector  of  an  appropriate  Hamiltonian. 

The  primary  motivation  for  this  study  is  a  class  of  control  problems  encountered  in 
flexible  manufacturing  systems.  By  explicitly  taking  into  account  the  noise  present  in  the 
dynamics,  we  are  able  to  remove  singularities  arising  in  the  noiseless  situation.  In  addition, 
we  show  that  hedging  type  policies  are  optimal  in  a  much  wider  class  of  non-anticipative 
policies  than  previously  considered.  We  have  confined  our  attention  to  the  flow  control 
level  only.  However,  our  results  can  be  used  to  study  control  problems  at  other  levels  in 
hierarchical  manufacturing  systems  [21],  as  well  as  control  problems  in  other  hybrid  systems 
(see,  e.g.,  [17],  [38],  [39]). 

Here  we  have  studied  only  the  discounted  criterion.  Following  [12],  we  can  obtain  similar 
results  for  the  finite  horizon  and  exit  time  criteria.  However,  the  long-run  average  cost 
problem  is  more  involved  and  is  currently  under  study. 
Appendix

Note  that  by  the  arguments  of  Section  5  we  can  establish  the  existence  of  a  homogeneous 
Markov  non-randomized  policy  for  each  fixed  initial  law.  The  independence  of  the  optimal 
policy  of  the  initial  law  is  obtained  by  the  dynamic  programming  characterization  of  the 
optimal  policy  via  Theorem  6.2.  Using  probabilistic  arguments,  dynamic  programming 
equations  can  be  derived  by  suitably  adapting  the  approach  in  [11,  Chap.  3].  However,  we 
will  present  in  a  brief  sketch  an  alternative  analytical  approach,  which  parallels  that  used 
for  classical  diffusions  in  [29]. 

We assume that for each $i \in S$, $c(\cdot,i,\cdot)$ is Lipschitz in its first argument uniformly
with respect to the third. We further assume that for each $x \in \mathbb{R}^N$, $i \in S$, the value
function $V(x,i) < \infty$. (This assumption may be replaced by some ergodicity hypotheses
on the process under some homogeneous Markov policy.) Let $B_R = \{x \in \mathbb{R}^N : \|x\| < R\}$.
Consider the Dirichlet problem on $B_R$

$$\inf_{u \in U}\,\big[L^u\varphi(x,i) + c(x,i,u)\big] = \alpha\varphi(x,i) \ \text{ in } B_R \times S, \qquad \varphi(x,i) = 0 \ \text{ on } \partial B_R \times S. \tag{A.1}$$

The existence of a unique solution $\varphi_R(x,i)$ of (A.1) in $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S)$, $2 \le p < \infty$, is
guaranteed by [31, Thm. 5.1, p. 422]. Thus to each $R > 0$ we have a solution $\varphi_R$ to
(A.1) belonging to $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S)$ for $2 \le p < \infty$. Using elliptic regularity results as in
Corollary 6.1, it follows that $\varphi_R(x,i) \in C^{2,\gamma}(B_R \times S)$, $0 < \gamma < 1$, $\gamma$ arbitrarily close to 1.
Let $v_R$ be a homogeneous Markov non-randomized policy which is a minimizing selector in
(6.4). Then using standard arguments involving Ito's formula it can be shown that

$$\varphi_R(x,i) = E\left[\int_0^{\tau_R} e^{-\alpha t}\, c\big(X(t), S(t), v_R(X(t),S(t))\big)\, dt \,\Big|\, X(0) = x,\ S(0) = i\right]$$
$$= \inf_{u(\cdot)} E\left[\int_0^{\tau_R} e^{-\alpha t}\, c\big(X(t), S(t), u(t)\big)\, dt \,\Big|\, X(0) = x,\ S(0) = i\right] \tag{A.2}$$
where $\tau_R$ is the hitting time of $\partial B_R$ by the process $X(\cdot)$. Clearly $\varphi_R(x,i) \le V(x,i)$ and
it can be easily seen from (A.2) that $\varphi_R(x,i)$ is increasing in $R$. Let $R' > R$. Then by
the interior estimates [31, pp. 398-402] $\{\varphi_{R'}\}_{R' > R}$ is bounded in $B_R$ uniformly in $R'$ and
$\{\nabla\varphi_{R'}\}_{R' > R}$ is bounded in $W^{1,2}(B_R \times S)$ uniformly in $R'$. By Sobolev's imbedding theorem
$W^{1,2}(B_R \times S) \hookrightarrow L^{2+\varepsilon}(B_R \times S)$ for some $\varepsilon > 0$. Then by suitably modifying (4.10) of [31,
p. 400], we have

$$\|\varphi_{R'}\|_{W^{2,2+\varepsilon}(B_R \times S)} \le K_R$$

where $K_R$ is a constant which does not depend on $R'$. (The modification is needed because
of the factor $\varepsilon > 0$, but it would be routine.) Repeating the above procedure over and
over again we conclude that $\{\varphi_{R'}\}_{R' > R}$ is uniformly bounded in $W^{2,p}(B_R)$ for $2 \le p < \infty$.
Since $W^{2,p}(B_R) \hookrightarrow W^{1,p}(B_R)$ and the injection is compact, it follows that $\{\varphi_{R'}\}$ converges
strongly in $W^{1,p}(B_R)$. Thus given any sequence $\{R_n\}$, $R_n \to \infty$ as $n \to \infty$, and for any fixed
integer $N \ge 2$, we can choose a subsequence $\{R_{n_i}\}$ such that $\{\varphi_{R_{n_i}}\}$ converges strongly
in $W^{1,p}(B_{N-1})$. Using a suitable diagonalization we may assume that $\{\varphi_{R_{n_i}}\}$ converges
strongly in $W^{1,p}(B_{N-1})$ for each integer $N \ge 2$. Let $\varphi$ be a limit point of $\{\varphi_{R_{n_i}}\}$. It can
then be shown as in [5, p. 148] (see also [31, p. 420]) that

$$\inf_{u \in U}\left[\sum_{k=1}^{N} b_k(x,j,u)\,\frac{\partial\varphi_{R_{n_i}}(x,j)}{\partial x_k} + \sum_{\ell=1}^{M} \lambda_{j\ell}(x,u)\big(\varphi_{R_{n_i}}(x,\ell) - \varphi_{R_{n_i}}(x,j)\big) + c(x,j,u)\right]$$
$$\xrightarrow[n_i \to \infty]{}\; \inf_{u \in U}\left[\sum_{k=1}^{N} b_k(x,j,u)\,\frac{\partial\varphi(x,j)}{\partial x_k} + \sum_{\ell=1}^{M} \lambda_{j\ell}(x,u)\big(\varphi(x,\ell) - \varphi(x,j)\big) + c(x,j,u)\right]$$

strongly in $L^p(B_{N-1})$. Then $\varphi \in W^{1,p}_{\mathrm{loc}}(\mathbb{R}^N \times S)$ and $\varphi$ satisfies

$$\inf_{u \in U}\,\big[L^u\varphi(x,i) + c(x,i,u)\big] = \alpha\varphi(x,i)$$

in $\mathcal{D}'(\mathbb{R}^N \times S)$, i.e. in the sense of distributions. By elliptic regularity $\varphi \in W^{2,p}_{\mathrm{loc}}(\mathbb{R}^N \times S)$,
$2 \le p < \infty$. Therefore as in Corollary 6.1, it follows that $\varphi \in C^{2,\gamma}(\mathbb{R}^N \times S)$, $0 < \gamma < 1$, $\gamma$
arbitrarily close to 1. Let $\tilde v$ be a minimizing selector corresponding to $\varphi$. Then by standard
arguments involving Ito's formula it can be shown that

$$\varphi(x,i) = E\left[\int_0^\infty e^{-\alpha t}\, c\big(X(t), S(t), \tilde v(X(t),S(t))\big)\, dt \,\Big|\, X(0) = x,\ S(0) = i\right]$$
$$= \inf_{u(\cdot)} E\left[\int_0^\infty e^{-\alpha t}\, c\big(X(t), S(t), u(t)\big)\, dt \,\Big|\, X(0) = x,\ S(0) = i\right].$$

Thus $\varphi(x,i) = V(x,i)$. In this situation (6.1) does not have a unique solution in general,
but $V(x,i)$ can be identified as a minimal nonnegative solution of (6.1) in $C^2(\mathbb{R}^N \times S)$.
The assertion of Theorem 6.2 is also valid in this case.